Comparison of relational methods, attribute-based methods and hybrid methods
Abstract
Most of the data mining methods in real-world intelligent systems are attribute-based machine learning methods such as neural networks, nearest neighbors and decision trees. They are relatively simple, efficient, and can handle noisy data. However, these methods have two strong limitations: (1) the background knowledge can be expressed in rather limited form, and (2) the lack of relations other than "object-attribute" makes the concept description language inappropriate for some applications. Relational and hybrid data mining methods based on first-order logic are compared with neural networks and other benchmark methods on different data sets. These computational experiments show several advantages of relational and hybrid methods.

1. Problem definition and objectives

Relational Data Mining (RDM) combines inductive logic programming (ILP) with probabilistic inference. The combination benefits from noise-robust probabilistic inference and from the highly expressive and understandable first-order logic rules employed in ILP. Data mining has two major sources to infer rules: database and machine learning technologies. The field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience. In recent years many successful machine learning applications have been developed, ranging from data-mining programs that learn to detect fraudulent credit card transactions, to information-filtering systems that learn users' reading preferences, to autonomous vehicles that learn to drive on public highways [6,8]. Currently statistical and artificial neural network methods dominate the design of intelligent systems and data mining. Alternative relational (symbolic) machine learning methods have shown their effectiveness in robotics (navigation, 3-dimensional scene analysis) and drug design (selection of the most promising components for drug design).
Traditionally, symbolic methods are used in areas with a lot of non-numeric (symbolic) knowledge. In robot navigation this is the relative location of obstacles (on the right, on the left, and so on). We discuss the key algorithms and theory that form the core of symbolic machine learning methods for applications with dominating numerical data. Relational Data Mining (RDM) technology is a data modeling approach that does not assume the functional form of the relationship being modeled a priori. It can automatically consider a large number of inputs (e.g., time series characterization parameters) and learn how to combine these to produce estimates for future values of a specific output variable. Most of the data mining methods are attribute-based machine learning methods such as neural networks, nearest neighbors and decision trees. They are relatively simple, efficient, and can handle noisy data. However, these methods have two strong limitations: (1) the background knowledge can be expressed in rather limited form, and (2) the lack of relations makes the concept description language inappropriate for some domains [1]. The purpose of a new area of machine learning called Inductive Logic Programming (ILP) is to overcome these limitations. Logic programming provided the solid theoretical basis for ILP. On the other hand, at present existing ILP systems are relatively inefficient and have rather limited facilities for handling numerical data [1]. We developed a hybrid ILP and probabilistic technique that handles numerical data efficiently [3,9]. One of the main advantages of ILP over attribute-based learning is ILP's generality of representation for background knowledge.

[Kovalerchuk, B., Vityaev, E., Comparison of relational methods and attribute-based methods for data mining in intelligent systems, 1999 IEEE Int. Symposium on Intelligent Control/Intelligent Systems, Cambridge, Mass., 1999, pp. 162-166. Working version.]
This enables the user to provide, in a more natural way, domain-specific background knowledge to be used in learning. The use of background knowledge enables the user both to develop a suitable problem representation and to introduce problem-specific constraints into the learning process. By contrast, attribute-based learners can typically accept background knowledge in rather limited form only [1].

2. Comparison of problem requirements and method capabilities

Dhar and Stein [2] introduced a unified vocabulary for matching computational intelligence problems and methods. A problem is described using a set of requirements (a problem ID profile). A method is described using its capabilities in the same terms. In [2] this vocabulary was applied to describe and compare several data mining methods. Neural networks (NN) are the most common methods in data mining. NN have three shortcomings for forecasting, related to: (1) explainability, (2) use of logical relations, and (3) tolerance for sparse data. Table 1 presents a wider comparison of different data mining methods, including neural networks.

3. Relational methods

A machine learning method called Machine Methods for Discovering Regularities (MMDR) is applied for forecasting time series. The method expresses patterns in first-order logic and assigns probabilities to rules generated by composing patterns. Currently the majority of learning systems for applications concentrate on neural networks, genetic algorithms, and related techniques. In practice, learning systems based on first-order representations have been successfully applied to many problems in chemistry, physics, medicine and other fields [1,6]. As with any technique based on first-order logic, MMDR produces human-readable forecasting rules [1,6,8], i.e., rules understandable in ordinary language, in addition to the forecast. A field expert can evaluate the performance of the forecast as well as a forecasting rule.
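To make the idea of first-order patterns over a time series concrete, here is a minimal sketch (not the authors' MMDR implementation; the rule, series, and function name are illustrative). It evaluates the relational hypothesis "IF x(t-1) > x(t-2) THEN x(t) > x(t-1)" by counting how often the premise fires and how often the conclusion then holds:

```python
# Illustrative sketch, not the paper's algorithm: evaluate a simple
# first-order hypothesis over a time series,
#   IF x(t-1) > x(t-2) THEN x(t) > x(t-1),
# by counting premise firings and correct conclusions.

def evaluate_rule(series):
    fired = correct = 0
    for t in range(2, len(series)):
        if series[t - 1] > series[t - 2]:   # premise: last step was up
            fired += 1
            if series[t] > series[t - 1]:   # conclusion: next step is up too
                correct += 1
    return fired, correct

series = [1.0, 1.2, 1.5, 1.4, 1.6, 1.9, 2.0, 1.8]
fired, correct = evaluate_rule(series)
print(fired, correct)  # 5 firings, 3 correct
```

The ratio correct/fired is an empirical estimate of the rule's conditional probability, which is the kind of quantity MMDR attaches to a composed pattern.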
Also, as with any technique based on probabilistic estimates, this technique delivers rules tested for their statistical significance. Statistically significant rules have an advantage over rules tested only for their performance on training and test data [6, ch. 5]. Training and testing data can be too limited and/or not representative; if rules rely only on them, then there is a greater chance that these rules will not deliver a correct forecast on other data. What is the motivation to use the suggested MMDR method in particular? MMDR uses a hypothesis/rule generation and selection process based on fundamental representative measurement theory [5]. The original challenge for MMDR was the simulation of discovering scientific laws from empirical data in chemistry and physics. There is a well-known difference between "black box" models and fundamental models (laws) in modern physics. The latter have a much longer life, wider scope and a solid foundation. In this paper we study several types of hypotheses/rules presented in first-order logic. They are simple relational assertions with variables. Mitchell [6] noted that relational assertions "can be conveniently expressed using first-order representations, while they are very difficult to describe using propositional representations" (pp. 275, 283-284). Many well-known rule learners such as AQ and CN2 are propositional [6,7]. Note that decision tree methods represent a particular type of propositional representation [6, p. 275]. Therefore decision tree methods such as ID3 and its successor C4.5 are better suited to tasks without relational assertions. Mitchell argues and gives examples that propositional representations offer no general way to describe the essential relations among the values of the attributes [6, pp. 283-284]. Below we follow his example.
In contrast with propositional rules, a program using first-order representations could learn the following general rule: IF Father(x,y) & Female(y) THEN Daughter(x,y), where x and y are variables that can be bound to any person. For the target concept Daughter1,2, a propositional rule learner such as CN2 or C4.5 would produce a collection of very specific rules such as IF (Father1=Bob) & (Name2=Bob) & (Female1=True) THEN Daughter1,2=True. Although it is correct, this rule is so specific that it will rarely, if ever, be useful in classifying future pairs of people [6, pp. 283-284]. We show that a similar problem exists for the ARIMA and neural network methods. First-order logic rules have an advantage in discovering relational assertions because they capture relations directly, e.g., Father(x,y) in the example above. In addition, first-order rules allow one to express naturally other, more general hypotheses, not only the relation between pairs of attributes [3]. These more general rules can serve both classification problems and interval forecasts of continuous variables. Moreover, these rules are able to capture Markov chain types of models used for time series forecasting. We share Mitchell's opinion about the importance of algorithms designed to learn sets of first-order rules that contain variables: "This is significant because first-order rules are much more expressive than propositional rules" [6, p. 274]. How does MMDR differ from other machine learning methods dealing with first-order logic [4,5]? From our viewpoint, the main accent in other first-order methods [4,5] is on two computational complexity issues: how wide is the class of hypotheses tested by the particular machine learning algorithm, and how to construct a learning algorithm to find deterministic rules. The emphasis of MMDR is on probabilistic first-order rules and on measurement issues, i.e., how we can move from a real measurement to a first-order logic representation.
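Mitchell's Daughter example can be sketched in a few lines (the names and ground facts below are illustrative, not from the paper). The first-order rule quantifies over variables x and y, so it applies to any pair of people; the propositional rule is bound to the constants it was learned from:

```python
# Minimal sketch of Mitchell's Daughter example; facts are invented.
# Background knowledge as ground facts.
father = {("Bob", "Sharon"), ("Bob", "Tom"), ("Victor", "Ann")}
female = {"Sharon", "Ann"}

def daughter_first_order(x, y):
    """IF Father(x,y) & Female(y) THEN Daughter(x,y) -- holds for any binding of x, y."""
    return (x, y) in father and y in female

def daughter_propositional(x, y):
    """A CN2/C4.5-style rule learned from one training pair: tied to specific constants."""
    return x == "Bob" and y == "Sharon"

# The first-order rule generalizes to unseen pairs; the propositional one does not.
print(daughter_first_order("Victor", "Ann"))    # True
print(daughter_propositional("Victor", "Ann"))  # False
```

The point is exactly the one made above: the relation Father(x,y) is represented directly, so one rule covers every pair, whereas the propositional learner must enumerate constant-specific rules.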
Note that recently Muggleton's team has moved in the same probabilistic direction. Moving from real measurements to a first-order logic representation is a non-trivial task [3]. For example, how do we represent a temperature measurement in terms of first-order logic without losing the essence of the attribute (temperature in this case) and without introducing unnecessary conventional properties? For instance, the Fahrenheit and Celsius zeros of temperature are our conventions, in contrast with the Kelvin scale, where the zero is a real physical zero: there are no temperatures below it. Therefore, incorporating properties of the Fahrenheit zero into first-order rules may force us to discover/learn properties of this convention along with the more significant scale-invariant forecasting rules. Learning algorithms in a space with those kinds of accidental properties may be very time consuming and may produce inappropriate rules. It is well known that the general problem of rule generation and testing is NP-complete. Therefore the discussion above is closely related to the following questions. What determines the number of rules, and when do we stop generating rules? What is the justification for specifying particular expressions instead of any other expressions? Using the approach from [3], we select rules which are simplest and consistent with measurement scales for a particular task. The algorithm stops generating new rules when they become too complex (i.e., statistically insignificant for the data) in spite of possibly high accuracy on training data. The other obvious stop criterion is a time limit. A detailed discussion of the mechanism of initial rule selection from the measurement theory [3] viewpoint is outside the scope of this paper. A special study may result in a catalogue of initial rules/hypotheses to be tested (learned) for particular applications. In this way any field analyst can choose rules to be tested without generating them. This paper delivers a preliminary list of rules for that catalogue.
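The temperature point can be illustrated with a short sketch (illustrative only; the readings are invented). A rule that refers to a scale's conventional zero changes its meaning when the unit changes, while an order relation between two measurements is invariant under any positive affine rescaling such as Celsius-to-Fahrenheit:

```python
# Illustrative sketch: scale-dependent vs scale-invariant rules.

def c_to_f(c):
    return c * 9.0 / 5.0 + 32.0  # positive affine transform between scales

readings_c = [-5.0, 3.0, 12.0]
readings_f = [c_to_f(c) for c in readings_c]

# Scale-dependent rule: "temperature is above zero".
above_zero_c = [t > 0 for t in readings_c]
above_zero_f = [t > 0 for t in readings_f]
print(above_zero_c == above_zero_f)  # False: the rule depends on the unit's zero

# Scale-invariant relation: "reading i exceeds reading j".
order_c = [(i, j) for i in range(3) for j in range(3) if readings_c[i] > readings_c[j]]
order_f = [(i, j) for i in range(3) for j in range(3) if readings_f[i] > readings_f[j]]
print(order_c == order_f)  # True: order survives any positive affine map
```

This is why rules built from order relations, rather than from conventional thresholds, are the safer raw material for first-order hypotheses over measured attributes.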
The critical issue in applying data-driven forecasting systems is generalization. The "Discovery" system generalizes data through "lawlike" logical probabilistic rules. Discovered rules have similar statistical estimates and significance on training and test sets of the studied time series. Theoretical advantages of MMDR generalization are presented in [6,2]. We use the mathematical formalisms of first-order logic rules described in [5,3].

4. Method for discovering regularities

Figure 1 describes the steps of MMDR. In the first step we select and/or generate a class of first-order logic rules suitable for a particular task. The next step is learning the particular first-order logic rules using available training data. Then we test the first-order logic rules on training and test data using the Fisher statistical criterion. After that we select
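The significance test in the step above can be sketched with a one-sided Fisher exact test on a 2x2 table of rule firings versus forecast outcomes. This is a minimal stdlib-only sketch, not the paper's implementation, and the counts are hypothetical:

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]:
    probability, under the hypergeometric null of independence, of a
    table at least as extreme (cell a at least as large as observed)."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    return p

# Hypothetical rule evaluation: of 30 cases where the rule fires, 25 forecasts
# are correct; of 30 cases where it does not fire, only 10 come out correct.
p = fisher_exact_one_sided(25, 5, 10, 20)
print(p < 0.01)  # True: such accuracy is unlikely if the rule were uninformative
```

A rule is kept only if this p-value is small on both training and test data, which is the sense in which statistically significant rules are preferred over rules judged by raw accuracy alone.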
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
Publication date: 2003